This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Added interoperability with arrow-schema #1442

Merged: 1 commit, Mar 26, 2023

tustvold
Contributor

Part of #1429

Adds conversions to/from arrow-schema

),
DataType::Decimal(precision, scale) => Self::Decimal128(precision as _, scale as _),
DataType::Decimal256(precision, scale) => Self::Decimal256(precision as _, scale as _),
DataType::Extension(_, d, _) => (*d).into(),
Contributor Author

I'm curious why arrow2 chose to represent extension types as an explicit data type, as opposed to just field metadata?

Owner

It is so that the Array has the logical type on it. It allows the user of a Box<dyn Array> to use .data_type() to perform the downcast and have the necessary information to build the extension.

For example, polars uses it to store arbitrary Python objects on the type. In theory this could be kept in the Field's metadata.

Pyarrow does the same: https://arrow.apache.org/docs/python/generated/pyarrow.ExtensionType.html
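
To illustrate the point about keeping the logical type on the array's data type, here is a minimal sketch. The `DataType` enum below is a simplified stand-in written for this example, not the real arrow2 definition (only the shape of the `Extension` variant follows the diff in this PR), and `python_object_type`, `storage`, and `extension_name` are hypothetical helpers.

```rust
// Simplified stand-in for illustration only; the real enum has many more variants.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Binary,
    // (extension name, storage type, optional serialized metadata)
    Extension(String, Box<DataType>, Option<String>),
}

impl DataType {
    /// Strips any extension wrappers and returns the physical storage type.
    /// This mirrors what `DataType::Extension(_, d, _) => (*d).into()` does
    /// in the conversion: the extension name is dropped.
    fn storage(&self) -> &DataType {
        match self {
            DataType::Extension(_, inner, _) => inner.storage(),
            other => other,
        }
    }

    /// Returns the extension name when the logical type is an extension.
    /// Because the name lives on the data type, code holding only the type
    /// (for example via `.data_type()`) can recover it without a Field.
    fn extension_name(&self) -> Option<&str> {
        match self {
            DataType::Extension(name, _, _) => Some(name.as_str()),
            _ => None,
        }
    }
}

/// Hypothetical example: an opaque object column stored as binary.
fn python_object_type() -> DataType {
    DataType::Extension(
        "python_object".to_string(), // name chosen for this sketch only
        Box::new(DataType::Binary),
        None,
    )
}
```

With field metadata instead, the same name would have to travel alongside the array in a separate Field, which is the trade-off discussed above.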

}
DataType::Decimal128(precision, scale) => Self::Decimal(precision as _, scale as _),
DataType::Decimal256(precision, scale) => Self::Decimal256(precision as _, scale as _),
DataType::RunEndEncoded(_, _) => panic!("Run-end encoding not supported by arrow2"),
Contributor Author

I could instead implement TryFrom, but that seemed a touch excessive for a single error case.
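
For comparison, a fallible conversion might look roughly like the sketch below. This is not what the PR does (the PR panics on run-end encoding); `SchemaType`, `Arrow2Type`, and `UnsupportedType` are stand-in names invented for this example, with only a handful of variants.

```rust
use std::convert::TryFrom;
use std::fmt;

// Stand-in for the source type (hypothetical, heavily reduced).
#[derive(Debug, PartialEq)]
enum SchemaType {
    Int32,
    Utf8,
    RunEndEncoded,
}

// Stand-in for the target type, which has no run-end encoded variant.
#[derive(Debug, PartialEq)]
enum Arrow2Type {
    Int32,
    Utf8,
}

/// Error returned for types the target cannot represent.
#[derive(Debug, PartialEq)]
struct UnsupportedType(&'static str);

impl fmt::Display for UnsupportedType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unsupported type: {}", self.0)
    }
}

impl std::error::Error for UnsupportedType {}

impl TryFrom<SchemaType> for Arrow2Type {
    type Error = UnsupportedType;

    fn try_from(value: SchemaType) -> Result<Self, Self::Error> {
        match value {
            SchemaType::Int32 => Ok(Arrow2Type::Int32),
            SchemaType::Utf8 => Ok(Arrow2Type::Utf8),
            // The single error case: surfaced to the caller instead of panicking.
            SchemaType::RunEndEncoded => Err(UnsupportedType(
                "Run-end encoding not supported by arrow2",
            )),
        }
    }
}
```

The cost is that every caller must handle a `Result`, even though all but one variant convert infallibly, which is the trade-off the comment above describes.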

@@ -160,6 +160,126 @@ pub enum DataType {
Extension(String, Box<DataType>, Option<String>),
}

#[cfg(feature = "arrow")]
impl From<DataType> for arrow_schema::DataType {
Contributor Author

This conversion uses truncating as _ casts because, provided the type is valid, they cannot overflow or underflow. I'm not sure that sensible behaviour for things like a negative-size FixedSizeList is necessary; garbage in, garbage out was my thinking.
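
A small sketch of what "truncating" means here, using plain integer casts. The integer widths are assumptions chosen for illustration (a precision held as `usize` on one side and `u8` on the other, a list size held as `i32` on one side and `usize` on the other), and the function names are hypothetical.

```rust
use std::convert::TryFrom;

/// Truncating cast, in the style of `precision as _`:
/// out-of-range input silently wraps modulo 256.
fn precision_truncating(precision: usize) -> u8 {
    precision as u8
}

/// Checked alternative: out-of-range input is reported instead of wrapping.
fn precision_checked(precision: usize) -> Option<u8> {
    u8::try_from(precision).ok()
}

/// A negative size cast to `usize` wraps around to a very large value.
fn list_size_truncating(size: i32) -> usize {
    size as usize
}
```

For a valid decimal precision such as 38, both functions agree. They differ only on input that was already invalid, for example 300 becomes 44 under the truncating cast, and a list size of -1 becomes `usize::MAX`. That is the "garbage in, garbage out" behaviour described above.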

@jorgecarleitao jorgecarleitao added the enhancement An improvement to an existing feature label Mar 26, 2023
@jorgecarleitao jorgecarleitao changed the title Interoperability with arrow-schema Added interoperability with arrow-schema Mar 26, 2023
@jorgecarleitao jorgecarleitao merged commit 33b82ab into jorgecarleitao:main Mar 26, 2023
@jorgecarleitao
Owner

Thank you @tustvold

ritchie46 pushed a commit to ritchie46/arrow2 that referenced this pull request Mar 29, 2023
ritchie46 pushed a commit to ritchie46/arrow2 that referenced this pull request Apr 5, 2023